Evaluating the CC-IDF citation-weighting scheme: How effectively can ‘Inverse Document Frequency’ (IDF) be applied to references?

Authors

  • Joeran Beel
  • Corinna Breitinger
  • Stefan Langer
Abstract

In the domain of academic search engines and research-paper recommender systems, CC-IDF is a common citation-weighting scheme used to calculate semantic relatedness between documents. CC-IDF adopts the principles of the popular term-weighting scheme TF-IDF and assumes that a rare academic citation shared by two documents should receive a higher weight than a citation shared among a large number of documents. Although CC-IDF is in common use, we found no empirical evaluation comparing CC-IDF with plain citation weight (CC-Only). Therefore, we conducted such an evaluation and present the results in this paper. The evaluation was conducted with real users of the recommender system Docear. The effectiveness of CC-IDF and CC-Only was measured using click-through rate (CTR). For 238,681 delivered recommendations, CC-IDF had about the same effectiveness as CC-Only (CTR of 6.15% vs. 6.23%). In other words, CC-IDF was not more effective than CC-Only, which is a surprising result. We provide a number of potential reasons and suggest further research to understand the principles of CC-IDF in more detail.
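The difference between the two weighting schemes can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the sketch assumes that CC-Only counts the citations two documents share and that CC-IDF weights each shared citation by log(N / df), where N is the number of documents in the corpus and df is the number of documents citing that reference. The function names, the toy corpus, and the choice of the natural logarithm are illustrative assumptions, not the Docear implementation.

```python
import math
from collections import Counter


def citation_document_frequency(corpus):
    """Count, for each cited reference, how many documents cite it."""
    df = Counter()
    for refs in corpus.values():
        df.update(set(refs))  # count each reference once per document
    return df


def cc_only(refs_a, refs_b):
    """Plain citation weight: the number of shared citations (assumed form)."""
    return len(set(refs_a) & set(refs_b))


def cc_idf(refs_a, refs_b, df, n_docs):
    """Shared citations weighted by an assumed IDF of log(N / df)."""
    shared = set(refs_a) & set(refs_b)
    return sum(math.log(n_docs / df[ref]) for ref in shared)


# Hypothetical toy corpus: "r_common" is cited by every document,
# "r_rare" only by d1 and d2.
corpus = {
    "d1": ["r_common", "r_rare"],
    "d2": ["r_common", "r_rare"],
    "d3": ["r_common"],
    "d4": ["r_common"],
}
df = citation_document_frequency(corpus)
n_docs = len(corpus)

for other in ("d2", "d3"):
    print(
        other,
        cc_only(corpus["d1"], corpus[other]),
        round(cc_idf(corpus["d1"], corpus[other], df, n_docs), 3),
    )
```

Under these assumptions, a citation shared by every document contributes nothing to the CC-IDF score, while CC-Only still counts it. The evaluation described above asks whether this extra weighting actually improves recommendations.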


Similar resources

Comparative Analysis of IDF Methods to Determine Word Relevance in Web Document

Inverse document frequency (IDF) is one of the most useful and widely used concepts in information retrieval. When it is used in combination with the term frequency (TF), the result is a very effective term weighting scheme (TF-IDF) that has been applied in information retrieval to determine the weight of the terms. Terms with high TF-IDF values imply a strong relationship with the document the...


News Recommendations using CF-IDF

Most traditional recommendation algorithms are based on TF-IDF, a term-based weighting method. This paper proposes a new method for recommending news items based on the weighting of the occurrences of references to concepts, which we call Concept Frequency-Inverse Document Frequency (CF-IDF). In an experimental setup we apply CF-IDF to a set of newswires in which we detect 1,167 instance...


Global term weights for document retrieval learned from TREC data

A key element in modern text retrieval systems is the weighting of individual words for importance. Early in the development of document retrieval methods it was recognized that performance could be improved if weights were based at least in part on the frequencies of individual terms in the database. This observation led investigators to propose inverse document frequency weighting, which has ...


Inverse Category Frequency based supervised term weighting scheme for text categorization

Term weighting schemes often dominate the performance of many classifiers, such as kNN, centroid-based classifiers and SVMs. The widely used term weighting scheme in text categorization, i.e., tf.idf, originated in the information retrieval (IR) field. The intuition behind idf seems less reasonable for text categorization than for IR. In this paper, we introduce inverse category frequency (icf) int...


TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Personal Document Collections

TF-IDF is one of the most popular term-weighting schemes, and is applied by search engines, recommender systems, and user modeling engines. With regard to user modeling and recommender systems, we see two shortcomings of TF-IDF. First, calculating IDF requires access to the document corpus from which recommendations are made. Such access is not always given in a user-modeling or recommender sys...



Publication date: 2016